Enlarging Monolingual Dictionaries for Machine Translation with Active Learning and Non-Expert Users
نویسندگان
چکیده
This paper explores a new approach to help non-expert users with no background in linguistics to add new words to a monolingual dictionary in a rule-based machine translation system. Our method aims at choosing the correct paradigm which explains not only the particular surface form introduced by the user, but also the rest of inflected forms of the word. A large monolingual corpus is used to extract an initial set of potential paradigms, which are then interactively refined by the user through active machine learning. We show the results of experiments performed on a Spanish monolingual dictionary.
منابع مشابه
Multimodal Building of Monolingual Dictionaries for Machine Translation by Non-Expert Users
This paper explores a new approach to help non-expert users with no background in linguistics to add new words to a monolingual dictionary in a rule-based machine translation system. Our method aims at obtaining the correct paradigm which explains not only the particular surface form introduced by the user, but also the rest of inflected forms of the word. An initial set of potential paradigms ...
متن کاملSource-Language Dictionaries Help Non-Expert Users to Enlarge Target-Language Dictionaries for Machine Translation
In this paper, a previous work on the enlargement of monolingual dictionaries of rule-based machine translation systems by non-expert users is extended to tackle the complete task of adding both source-language and target-language words to the monolingual dictionaries and the bilingual dictionary. In the original method, users validate whether some suffix variations of the word to be inserted a...
متن کاملExploiting Aggregate Properties of Bilingual Dictionaries For Distinguishing Senses of English Words and Inducing English Sense Clusters
We propose a novel method for inducing monolingual semantic hierarchies and sense clusters from numerous foreign-language-to-English bilingual dictionaries. The method exploits patterns of non-transitivity in translations across multiple languages. No complex or hierarchical structure is assumed or used in the input dictionaries: each is initially parsed into the “lowest common denominator” for...
متن کاملExploiting Similarities among Languages for Machine Translation
Dictionaries and phrase tables are the basis of modern statistical machine translation systems. This paper develops a method that can automate the process of generating and extending dictionaries and phrase tables. Our method can translate missing word and phrase entries by learning language structures based on large monolingual data and mapping between languages from small bilingual data. It u...
متن کاملGeneration of Bilingual Dictionaries using Comparable and Quasi Comparable Corpora
The amount of information available on the web is increasing rapidly. The number of internet users is also increasing every day. A significant section of internet users is monolingual. They want to express themselves in their native language and also seeking information in the same. Hence, multilingual content over the internet is also increasing at a rapid pace. There is a need of systems whic...
متن کامل